ABSTRACT

Recommender systems are emerging technologies that can nowadays be found in many applications such as Amazon, Netflix, and so on. These systems help users find relevant information, recommendations, and their preferred items. Even a slight improvement in the accuracy of these recommenders can strongly affect the quality of recommendations.

Matrix factorization is a popular method in recommendation systems that shows promising results in accuracy and complexity. In this paper we propose an extension of matrix factorization that adds general neighborhood information to the recommendation model. Users and items are clustered into different categories to see how these categories share preferences. We then employ these shared interests of categories in a fusion with biased matrix factorization to achieve more accurate recommendations. This complements the current neighborhood-aware matrix factorization models, which rely on direct neighborhood information of users and items. The proposed model is tested on two well-known recommendation system datasets: MovieLens100k and Netflix. Our experiment shows that applying the general latent features of categories to factorized recommender models improves the accuracy of recommendations. The current neighborhood-aware models need a large number of neighbors to achieve good accuracy. To the best of our knowledge, the proposed model is better than or comparable with the current neighborhood-aware models when they consider a smaller number of neighbors.

Categories and Subject Descriptors 

[Information systems]: Data Mining, Collaborative Filtering, Clustering

General Terms 

Recommendation Systems 
1. INTRODUCTION

Recommender systems are emerging technologies that can be found in many present-day applications. Netflix¹ recommends to users a list of movies that they may be interested in; Google News² tracks the news that users are following and gives a list of recommended articles; and Amazon³ suggests everything from books to magazines. All of these recommendations come from engines called recommendation systems. These are systems that find relevant information and recommendations for users based on the users' observed preferences (news, books, movies, music, etc.). For example, Figure 1 presents an artificial dataset that includes 5 users who rate newly released movies. The solid arrows reflect the users' preferences and the dashed arrows reflect the users' non-preferences on these items. The problem can be defined as predicting unknown user preferences based on known user preferences and the profiles of items and users. Thus, improving the accuracy of these predictions is one of the main goals in recommendation systems [2].

As described in [5], even a slight improvement in accuracy can strongly affect the quality of recommendations. In the literature, many methods are usually blended to achieve better recommendation accuracy. These methods include different parameters and information (time, neighborhood, context, content, and so on) in the recommendation process. Thus, deducing new information and adding it to current models may improve the blended model's accuracy.

Collaborative filtering is a well-known solution in recom- 
mendation systems that uses the known preferences for mod- 
eling this recommendation problem. It relates two items 
based on the fact that many users have purchased or rated 
those items, or it relates two users based on their similar 
purchases or preferences. This solution is in contrast to the 
content filtering solution. Models based on content filtering 
focus on analyzing sets of user profiles and product features 
to find similarities between users and items, and employ 
these similarities for the recommendation purpose. 

Neighborhood-aware collaborative filtering methods use the known preferences to calculate similarities between users and find the $h$ most similar users to every user (they do the same for items). They then deduce an unknown rating of user $u$ for an item $i$, $r_{ui}$, by considering the ratings of that item by the $h$ users most similar to user $u$, or by considering the ratings that user $u$ has given to the $h$ items most similar to item $i$ [8]. For example, in Figure 1 this approach finds users 2 and 3 as the most similar users to user 1 and then applies their preferences on the item Madagascar to predict the unknown preference of user 1 on Madagascar. However, these methods have to employ a large value of $h$ to achieve good accuracy.

1 http://www.netflix.com
2 http://news.google.com
3 http://www.amazon.com



Figure 1: Finding categories among users and items based on their behaviors in an artificial dataset.



Matrix factorization ([7], [8]) is a common dimension reduction algorithm that generalizes users' preferences and items' histories into a very limited number of latent features. It uses these latent features to predict possible preferences or ratings of users on items. Matrix factorization is based on the singular value decomposition (SVD) but in a lower dimension. It is a good solution for dealing with the complexity of finding relations as well as generalizing them. There are two common approaches for applying neighborhood information to factorized models, which can be described with the aforementioned example as follows: 1) predicting the possible ratings of the users most similar to user 1 on the item Madagascar based on their latent vectors, and then using these predictions to predict $r_{ui}$ (by weighted averaging); or 2) using the matrix factorization technique to find similarities between users and items with less cost and more generality, and then using these similarities (between users 1, 2, and 3, and between items IceAge, Brave, and Madagascar) together with the known preferences to predict the unknown preference. Neighborhood-aware matrix factorization has also shown good prediction accuracy in the Netflix Prize competition [8] [6].

Inspired by this idea, in our proposed model we try to find the categories among users and items and then consider how users inside these categories have implicitly rated the categories of items on average. This has two advantages over the common neighborhood models:

1. It generalizes the users' interests and items' features to the possible categories that they belong to. For example, user 1 may belong to the category Adults and the item Madagascar may belong to the category Cartoons. Thus, in addition to considering whether user 1 is interested in the item Madagascar, we consider whether the category Adults shares preferences with the category Cartoons in general.

2. It considers deeper similarities. Item-item models check whether user 1 is interested in the item Madagascar and its similar items (IceAge and Brave). User-user models check whether user 1 and its similar users (2 and 3) are interested in the item Madagascar. In our approach, we check whether users 1, 2, and 3 are interested in the items Madagascar, IceAge, and Brave.

Thus, the current neighborhood-aware models are likely to predict a high interest of user 1 in the item Madagascar because of the high similarity between users 3 and 1. But in a more general view, they belong to categories that do not share many interests (Adults on Cartoons). Hence, in our proposed model, we first try to find these categories (Adults, Cartoons, Drama, and so on) and then consider the effect of these categories' behaviors alongside the behavior of each user and item in an extension of biased matrix factorization. However, these categories are usually not easily separable and understandable (like Drama and Adults in our example) using collaborative information alone.

We use Root Mean Squared Error (RMSE) as the accuracy measure on two well-known datasets, Netflix and MovieLens100k, which have been used in many publications such as [7], [8], and [3]. Our experiment shows that the proposed method improves on biased matrix factorization. However, the information of categories is too general to substitute for explicit preferences. Hence, it can be considered as newly deduced information that can complement current factorized and neighborhood-aware models by adding more generality and deeper similarities. Also, Section 4.3 shows how improving the clustering quality by adding contents results in more accurate recommendations.

2. PROBLEM DEFINITION 

In a general recommendation problem, we have a set of users $U = \{u_1, u_2, \dots, u_n\}$ and a set of items $I = \{i_1, i_2, \dots, i_m\}$ that are accompanied by a rating matrix $R = [r_{ui}]_{n \times m}$, where $r_{ui}$ represents the rating of user $u$ on item $i$. These rating values are limited to a range $[a, b]$ depending on the application, but a range $[0, 5]$ or a boolean value $\{0, 1\}$ are quite common forms of rating in real-world applications.

Collaborative filtering consists of predicting the unknown $r_{ij}$s based on the known $r_{ij}$s inside the matrix $R$.
Matrix factorization addresses this problem by decomposing the ratings matrix $R$ into two lower-dimensional matrices $Q$ and $P$, which contain a corresponding vector of length $k$ for every item and user, respectively, where $k \ll m, n$. Such a model is close to the singular value decomposition (SVD) technique for finding latent vectors in information retrieval [7].
The learning algorithm starts with a random initialization of the matrices $Q$ and $P$. In every learning step it then changes these variables so that $Q^T P$ converges to the known values of $R$. At prediction time, the product of the learned matrices is used to predict the unknown $r_{ij}$s.
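As an illustration of this learning loop, the following is a minimal pure-Python sketch of stochastic gradient descent for the biased variant of matrix factorization that this section goes on to describe (Algorithm 1). The function names `train_bmf` and `predict`, the toy hyperparameter defaults, and the uniform initialization range are assumptions made for the example, not the values or code used in the paper.

```python
import random

def train_bmf(ratings, n_users, n_items, k=2, gamma=0.02,
              lam1=0.05, lam2=0.01, epochs=300, seed=0):
    """Fit biased matrix factorization by SGD on (user, item, rating) triples.

    All default hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Small random initialization of latent vectors; biases start at zero.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - predict(P, Q, bu, bi, u, i)  # actual minus predicted
            for f in range(k):
                q_old, p_old = Q[i][f], P[u][f]
                Q[i][f] += gamma * (e * p_old - lam1 * q_old)
                P[u][f] += gamma * (e * q_old - lam1 * p_old)
            bi[i] += gamma * (e - lam2 * bi[i])
            bu[u] += gamma * (e - lam2 * bu[u])
    return P, Q, bu, bi

def predict(P, Q, bu, bi, u, i):
    """Predicted rating: dot product of latent vectors plus both biases."""
    return sum(q * p for q, p in zip(Q[i], P[u])) + bi[i] + bu[u]
```

On a handful of toy ratings, calling `train_bmf(ratings, n_users, n_items)` and then `predict(...)` on the training pairs should give values close to the known ratings; unknown pairs are predicted from the same learned vectors.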

Thus, matrix factorization characterizes every user and item by assigning each a latent vector. Let $q_i \in \mathbb{R}^k$ be the vector corresponding to item $i$ and $p_u \in \mathbb{R}^k$ the vector corresponding to user $u$. The dot product of the item and user vectors is taken as the predicted rating of the user on the item:

$$\hat{r}_{ui} = q_i^T p_u$$



Also, a bias value is usually assigned to each user and item to reflect their mean ratings. Adding the biases, the above statement changes to:

$$\hat{r}_{ui} = q_i^T p_u + b_i + b_u$$

Let us define the error $e_{ui}$ as the actual rating minus the predicted value in each step. In the learning phase, the goal is to minimize the squared error of the predictions. Regularization terms prevent over-fitting of the model and keep the latent values small. The minimization function is as follows:

$$\min \sum_{r_{ui} \in K} \left( r_{ui} - q_i^T p_u - b_i - b_u \right)^2 + \lambda_1 \left( \|q_i\|^2 + \|p_u\|^2 \right) + \lambda_2 \left( |b_u|^2 + |b_i|^2 \right)$$

Algorithm 1 presents the learning process of biased matrix factorization using a stochastic gradient descent technique, given a learning rate $\gamma$ and regularization rates $\lambda_1$ and $\lambda_2$.

Algorithm 1 Biased Matrix Factorization's updating formula

Input: Train set K
Initialize matrices P, Q and the biases
repeat
    for every known $r_{ui}$ in the train set K do
        $q_i \leftarrow q_i + \gamma (e_{ui} \cdot p_u - \lambda_1 \cdot q_i)$
        $p_u \leftarrow p_u + \gamma (e_{ui} \cdot q_i - \lambda_1 \cdot p_u)$
        $b_i \leftarrow b_i + \gamma (e_{ui} - \lambda_2 \cdot b_i)$
        $b_u \leftarrow b_u + \gamma (e_{ui} - \lambda_2 \cdot b_u)$
    end for
until a limited number of epochs is reached

3. CLUSTERING-BASED MATRIX FACTORIZATION

In our proposed extension, we add neighborhood information to the basic biased matrix factorization model in a novel way. There are two common approaches in neighborhood-aware matrix factorization models: 1) item-item models, which consider whether user $u$ is interested in item $i$ and its similar items; and 2) user-user models, which consider whether user $u$ and its similar users are interested in item $i$. In the literature, a fusion of both item-item and user-user models has usually been applied in the final predictor function to achieve better accuracy. However, these models ignore the neighborhood information of the possible categories that items and users may belong to. Thus, we first try to find these possible categories and then consider the shared interests between these categories in the basic user-item model. As discussed earlier, this has two advantages over the common neighborhood models:

1. It generalizes the users' interests and items' features to the categories that they may belong to. Thus, we consider whether the general interests of users match the general features of items.

2. It considers deeper similarities. More specifically, it considers whether user $u$ and its similar users are interested in item $i$ and its similar items.



We employ the Kmeans clustering method to find possible categories of items and users. We first use biased matrix factorization on the train set to find the latent vectors of each user and item. Kmeans is then applied to these latent vectors with different selections of K (the number of clusters) to find possible categories of items and users. Employing latent vectors to cluster sparse data has been used successfully in the literature, such as [10]. Rating matrices are usually sparse in the recommendation systems context. Hence, using latent vectors helps to reduce the complexity of clustering these large and sparse datasets.
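To make the clustering step concrete, here is a minimal sketch of Lloyd's k-means algorithm applied to a list of latent vectors. It is a plain-Python stand-in for whichever Kmeans implementation is used in practice; the function name `kmeans`, the random-sample initialization, and the fixed iteration count are assumptions of this example.

```python
import random

def kmeans(vectors, n_clusters, n_iter=20, seed=0):
    """Cluster latent vectors with Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    # Assumed initialization: pick n_clusters distinct input vectors as centers.
    centers = [list(v) for v in rng.sample(vectors, n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest center (squared distance).
        for idx, v in enumerate(vectors):
            labels[idx] = min(
                range(n_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

The returned `labels` list plays the role of the category assignment: running it once on the user latent vectors and once on the item latent vectors gives a category index for every user and every item.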

After finding these categories of items and users, we assign every cluster (category) a latent vector accompanied by a bias value, in the same way as in basic biased matrix factorization. We consider the ratings between these categories in a new matrix $R^*$. In the new rating matrix, every $r^*_{C_u C_i}$ reflects the average rating of the users inside the category that $u$ belongs to on the items inside the category of $i$.
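The category-level averaging described here can be sketched in a few lines of Python. The sketch assumes that the average is taken over known ratings only and that category pairs without any known rating are simply absent from the result; the function name `category_ratings` and the list-based cluster lookups are hypothetical.

```python
from collections import defaultdict

def category_ratings(ratings, user_cluster, item_cluster):
    """Average the known ratings for every (user category, item category) pair.

    ratings: iterable of (user, item, rating) triples.
    user_cluster / item_cluster: category index per user / per item.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u, i, r in ratings:
        key = (user_cluster[u], item_cluster[i])
        sums[key] += r
        counts[key] += 1
    # Only category pairs with at least one known rating get an entry.
    return {key: sums[key] / counts[key] for key in sums}
```

For instance, if users 0 and 1 share a category and rate the same item 4 and 2, the corresponding category pair receives the average of those two ratings.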



Here $n' < n$ and $m' < m$ are the numbers of categories of users and items, respectively. Hence, the predictor function of clusters for every $r^*_{C_u C_i}$ using the biased matrix factorization technique is:

$$\hat{r}_{C_u C_i} = q_{C_i}^T p_{C_u} + b_{C_i} + b_{C_u} \qquad (1)$$

$C_i$ and $C_u$ are the categories found for item $i$ and user $u$, and $q_{C_i}$ and $p_{C_u}$ are the corresponding latent vectors of these categories. These latent vectors are learned using the same stochastic gradient descent technique that is described in Algorithm 1. Thus, instead of predicting a rating for pairs of users and items, it predicts a rating for their categories.

However, the shared preferences of categories are too general to be employed alone for prediction. Thus, we use a fusion of the basic biased matrix factorization model with the clustering-based biased matrix factorization model in the final predictor function as follows:

$$\hat{r}_{ui}^{final} = (1 - \alpha)\, \hat{r}_{ui} + \alpha\, \hat{r}_{C_u C_i} \qquad (2)$$

$0 \le \alpha \le 1$ controls the effect of both models in the final predictor function. We name this fusion Clustering-Based Matrix Factorization (CBMF) in our experimental results. The clusters' latent vectors and the users' and items' latent vectors are learned simultaneously with respect to the error function $e_{ui}^{final}$, which is the actual rating minus $\hat{r}_{ui}^{final}$. Algorithm 2 presents the learning process of our proposed model.

Algorithm 2 Our proposed CBMF's updating formula

Input: Train set K
Initialize matrices P, Q, $P_C$, $Q_C$ and the biases
repeat
    for every known $r_{ui}$ in the train set K do
        $q_i \leftarrow q_i + \gamma (e_{ui}^{final} \cdot p_u - \lambda_1 \cdot q_i)$
        $p_u \leftarrow p_u + \gamma (e_{ui}^{final} \cdot q_i - \lambda_1 \cdot p_u)$
        $b_i \leftarrow b_i + \gamma (e_{ui}^{final} - \lambda_2 \cdot b_i)$
        $b_u \leftarrow b_u + \gamma (e_{ui}^{final} - \lambda_2 \cdot b_u)$
        $q_{C_i} \leftarrow q_{C_i} + \gamma (e_{ui}^{final} \cdot p_{C_u} - \lambda_1 \cdot q_{C_i})$
        $p_{C_u} \leftarrow p_{C_u} + \gamma (e_{ui}^{final} \cdot q_{C_i} - \lambda_1 \cdot p_{C_u})$
        $b_{C_i} \leftarrow b_{C_i} + \gamma (e_{ui}^{final} - \lambda_2 \cdot b_{C_i})$
        $b_{C_u} \leftarrow b_{C_u} + \gamma (e_{ui}^{final} - \lambda_2 \cdot b_{C_u})$
    end for
until a limited number of epochs is reached

4. EXPERIMENT
We set up our experiment on the MovieLens100k and Netflix datasets¹. The MovieLens100k dataset was collected by the GroupLens Research Project at the University of Minnesota. It contains 100,000 ratings from 943 users on 1682 movies, where each user has rated at least 20 movies [3]. The package includes five random 80%/20% splits of the dataset into training and test sets. We employ these training and test sets provided in the package (u1, u2, ..., u5) in our evaluation. The Netflix dataset contains over 100 million ratings from 480189 users who have rated 17770 movies. In both datasets, ratings are in a range of [1, 5]. Both datasets are very sparse, as only 1% of the ratings are known and 99% are unknown. We run each algorithm 5 times on the datasets to remove the effect of the random initialization of latent vectors on the predictions. Thus, our reported RMSE results are the averages of the RMSEs of these 5 runs.
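For reference, the evaluation measure and the averaging over repeated runs can be written as the following small sketch. The function names `rmse` and `mean_rmse` and the representation of a run as a pair of lists are assumptions of this example.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length rating lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mean_rmse(runs):
    """Average RMSE over several runs; each run is (actual, predicted)."""
    return sum(rmse(a, p) for a, p in runs) / len(runs)
```

With five runs that differ only in their random initialization, `mean_rmse` would be called with a list of five `(actual, predicted)` pairs.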

We start by applying biased matrix factorization to both datasets to find their users' and items' latent vectors. We employ $\gamma = 0.005$, $\lambda_1 = 0.035$, and $\lambda_2 = 0.0001$ in the learning process of BMF for both datasets. These latent vectors (learned on the train sets) are then used for clustering. Kmeans is applied to the latent vectors to find possible categories of items and categories of users inside the datasets. [7] shows how similar latent vectors represent similar movies in the Netflix dataset. We try different selections of possible categories by changing the number of clusters.

After guessing the categories, a latent vector accompanied by a bias value is assigned to each category. We start with a random initialization of these parameters, which are then learned with respect to the average ratings of the users inside a category on the items inside different categories. We use $\gamma^c = 0.0002$ and $\lambda_1^c = \lambda_2^c = 0.1$ in the learning process for the Netflix dataset (60 epochs), and $\gamma^c = 0.005$, $\lambda_1^c = 0.01$, and $\lambda_2^c = 0.001$ for the MovieLens100k dataset (50 epochs).
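The joint learning of user-item and category parameters, with separate learning and regularization rates for the category side, can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout of the model `m`, the helper names, and the choice to apply the fused error directly to both parameter groups (without extra factors of $\alpha$ in the updates) are choices made for this example rather than a description of the released implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_final(m, u, i, alpha):
    """Fused prediction: (1 - alpha) * user-item model + alpha * category model."""
    cu, ci = m["cu"][u], m["ci"][i]
    r_ui = dot(m["Q"][i], m["P"][u]) + m["bi"][i] + m["bu"][u]
    r_cat = dot(m["Qc"][ci], m["Pc"][cu]) + m["bci"][ci] + m["bcu"][cu]
    return (1 - alpha) * r_ui + alpha * r_cat

def sgd_pair(vec_a, vec_b, e, rate, lam):
    """Regularized SGD step on two paired latent vectors, updated in place."""
    for f in range(len(vec_a)):
        a, b = vec_a[f], vec_b[f]
        vec_a[f] += rate * (e * b - lam * a)
        vec_b[f] += rate * (e * a - lam * b)

def cbmf_update(m, u, i, r, alpha, gamma, lam1, lam2, gamma_c, lam1_c, lam2_c):
    """One update for a known rating r of user u on item i; returns the error."""
    cu, ci = m["cu"][u], m["ci"][i]
    e = r - predict_final(m, u, i, alpha)  # error of the fused prediction
    # User and item parameters.
    sgd_pair(m["Q"][i], m["P"][u], e, gamma, lam1)
    m["bi"][i] += gamma * (e - lam2 * m["bi"][i])
    m["bu"][u] += gamma * (e - lam2 * m["bu"][u])
    # Category parameters, with their own (assumed separate) rates.
    sgd_pair(m["Qc"][ci], m["Pc"][cu], e, gamma_c, lam1_c)
    m["bci"][ci] += gamma_c * (e - lam2_c * m["bci"][ci])
    m["bcu"][cu] += gamma_c * (e - lam2_c * m["bcu"][cu])
    return e
```

Setting `alpha` to 0 reduces `predict_final` to the plain biased matrix factorization predictor, while `alpha` equal to 1 uses only the category-level prediction.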

Figure 2 illustrates a comparison between the RMSE results of biased matrix factorization and our proposed clustering-based matrix factorization. As shown, our method outperforms biased matrix factorization on both datasets. Recall that even a small improvement in prediction accuracy can strongly affect the quality of recommendations [5]. Our proposed clustering-based matrix factorization achieves an RMSE of 0.90667 for the MovieLens100k dataset and an RMSE of 0.90296 for the Netflix dataset ($k = 120$). Increasing the dimension of the latent vectors improves the predictions' accuracy. For instance, by increasing $k$ to 600, the RMSE of the model improves to 0.90098 for the Netflix dataset. As discussed in Subsection 4.4, our model has lower learning complexity than well-known neighborhood-aware recommendation models. Thus, increasing $k$ has less effect on the learning time of our proposed model.

It is hard to accurately compare our results with the current neighborhood-aware recommender models for two reasons:

1. Different learning parameters and initializations. These models are sensitive to the parameters and initialization that are employed in the learning process. For instance, by changing the learning parameters ($\gamma = 0.004$, $\lambda_1 = 0.02$, $\lambda_2 = 0.001$, and $k = 240$) we observe a significant improvement of RMSE in both biased matrix factorization and clustering-based matrix factorization. With these parameters the RMSE falls to 0.8996 for BMF and 0.8992 for CBMF.

2. Employing a greater number of neighbors in the rating prediction process for each user and item. We do not expect to improve on the current neighborhood-aware matrix factorization models such as the factorized model proposed in [5]. Those models employ direct similarities of a large number of users and items. For example, the Asymmetric-SVD model [5] achieves an RMSE of 0.9139 when applying 250 neighbors of items on the Netflix dataset. This RMSE then falls to 0.9002 by applying the effect of the whole set of items (17770 items). The RMSE of the integrated factorized model proposed in [5] is 0.8953 when applying the effect of all 17770 items in the model. However, by employing a greater number of neighbors, the model's learning time increases linearly.

1 The implementation package is publicly accessible at: https://sites.google.com/site/nmirbakhsh/projects/CBMF

However, our achieved accuracy is still better than that of those models when they employ a smaller number of neighbors. It is also significantly better than the "WgtNgbr" model (RMSE of almost 0.911) and the correlation-based neighborhood model (RMSE of 0.9406) reported in [5]. On the other hand, several recommendation methods are blended in the literature to achieve more accurate results. As we employ novel deduced information from categories in our recommender model, we expect that adding our proposed model to a blended model will improve it.

Our proposed model has two important parameters to be determined: $\alpha$ and the number of clusters.

4.1 Selecting the Alpha Value 

Figure 3 shows how the accuracy of our model changes with different selections of $\alpha$. For $\alpha = 0.0$, the model does not consider the effect of clusters, so the RMSE is the same as that of biased matrix factorization, as expected. For $\alpha = 1.0$, i.e., only considering the effect of categories, the RMSE is 0.98970 for the Netflix dataset and 1.00263 for the MovieLens100k dataset. This shows that employing the categories' information alone is not effective for recommendation.

A validation set is used to determine the best selection of $\alpha$. As shown in Figure 3a, $\alpha = 0.6$ achieves the best accuracy on the validation set for the Netflix dataset, and $\alpha = 0.4$ is selected for the final evaluation of CBMF on the MovieLens dataset (Figure 3b).
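Selecting $\alpha$ on a validation set amounts to a one-dimensional grid search, which might look like the sketch below. Here `evaluate` is a hypothetical callable that trains CBMF with a given $\alpha$ and returns its validation RMSE; the candidate grid is likewise an assumption.

```python
def select_alpha(evaluate, candidates):
    """Return the candidate alpha with the lowest validation RMSE, plus all scores."""
    scores = {alpha: evaluate(alpha) for alpha in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

A typical call would pass a grid such as `[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]`, where the two endpoints correspond to pure biased matrix factorization and the pure category model.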
Figure 2: A comparison between Biased Matrix Factorization (BMF) and our proposed Clustering-based Matrix Factorization (CBMF).

4.2 Number of Clusters

Tables 1 and 2 show the effect of changing the number of clusters ($m'$, $n'$) on the accuracy of the final predictor function. A validation set can be used to determine the best selection of $m'$ and $n'$. Obviously, there is no need to test all the combinations of $m'$ and $n'$. We start testing with four combinations of $m'$ and $n'$, from small to very large values. We then employ the best combination of $m'$ and $n'$ to limit our search space. This process can be repeated until finding the $m'$ and $n'$ that minimize the predictions' RMSE. As shown in Table 1, $m' = 6000$ and $n' = 500$ are found to be the best choices of $m'$ and $n'$ for the Netflix dataset, and $m' = 500$ and $n' = 50$ for the MovieLens100k dataset (Table 2). Again, a validation set is used to determine these numbers. It seems that in both cases fewer clusters for users and more clusters for items are preferred by the model.

For very large selections of $m'$ and $n'$, i.e., when $m' \to m$ and $n' \to n$, the final predictor's accuracy converges to the accuracy of biased matrix factorization, as expected. For small selections of $m'$ and $n'$, CBMF does not improve on biased matrix factorization.



Table 1: Applying CBMF on the Netflix dataset employing different numbers of clusters for users and items. The best choice for the number of clusters is determined using a validation set. The numbers show CBMF's RMSE on a validation set from the Netflix dataset for different selections of $m'$ and $n'$.

             n' = 100   n' = 500   n' = 1000
m' = 1000    0.90392    0.90377    0.90374
m' = 3000    0.90394    0.90372    0.90372
m' = 6000    0.90397    0.90370    0.90371
m' = 9000    0.90393    0.90376    0.90383

Table 2: RMSE results of applying CBMF on a validation set from MovieLens100k. $m'$ and $n'$ are changed to find a combination with the minimum prediction RMSE.

Figure 3: The accuracy of the proposed CBMF model applied to the two datasets (accuracies are calculated on a validation set). It shows how changing $\alpha$ affects the final predictor's accuracy.

4.3 Adding Contents

The MovieLens100k dataset contains some profiles for its items. It provides the names of the movies, the genres that the movies belong to, etc. Each movie may belong to more than one genre. It is expected that employing these contents will slightly improve the items' clustering quality. Thus, we employ the genre information of movies alongside the learned latent vectors to see how improving the clustering quality affects the accuracy of recommendations. Kmeans is applied to the new feature space, and we run the CBMF model again with the new clusters. As Table 3 shows, by adding this little information about the items, the prediction RMSE improves to 0.90482. This shows that improving the quality of clustering results in better accuracy.
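One simple way to build such a combined feature space is to append a multi-hot genre vector to each item's latent vector before clustering. The text does not state how the genre information is combined with the latent vectors, so the concatenation, the `weight` parameter, and the function name `add_genre_features` below are all assumptions of this sketch.

```python
def add_genre_features(latent, genres, n_genres, weight=1.0):
    """Concatenate each item's latent vector with a multi-hot genre vector.

    latent: list of latent vectors, one per item.
    genres: list of genre-index lists, one per item (an item may have several).
    weight: assumed scaling of the genre indicators relative to latent values.
    """
    augmented = []
    for vec, item_genres in zip(latent, genres):
        indicator = [0.0] * n_genres
        for g in item_genres:
            indicator[g] = weight
        augmented.append(list(vec) + indicator)
    return augmented
```

The augmented vectors can then be passed to the clustering step in place of the raw latent vectors; the relative scale set by `weight` would itself need tuning on a validation set.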

4.4 Complexity 

Table 4 shows the complexity of our proposed model, biased matrix factorization, and three neighborhood-aware matrix factorization models proposed by [8] and [6]. Biased matrix factorization, ASVD, and the factorized neighborhood-aware model by [5] have no preprocessing, but the model proposed by [8] adds a preprocessing step to find the most similar users and items. $R(u)$ is the set of items for which ratings by $u$ are available, $N(u)$ contains all items for which $u$ provided an implicit preference, and likewise, $R(i)$ denotes the set of users who rated item $i$. Thus, $\sum_u |R(u)|$ is the number of known rating values in matrix $R$.

The preprocessing of our proposed algorithm consists of clustering the learned latent vectors. Hence, the complexity of our preprocessing depends on the clustering algorithm that one uses. For instance, the complexity of Kmeans is $O(n^{dk+1} \log n)$. In the learning process, our algorithm has the same complexity as biased matrix factorization. Employing the fast Kmeans provided by the RapidMiner¹ package, it takes less than two hours to compute the clusters for the Netflix dataset.



1 http://rapid-i.com/

Table 3: Applying CBMF on the MovieLens100k dataset with and without adding the contents. Adding the genre information of movies improves the clustering quality and consequently the quality of the predictions.

        BMF       CBMF (without contents)   CBMF (with contents)
RMSE    0.90949   0.90667                   0.90482

Table 4: Complexity of the proposed model against Basic Matrix Factorization, Neighborhood Aware Matrix Factorization (NAMF-1) by [8], the Asymmetric-SVD model, and the integrated neighborhood-aware matrix factorization (NAMF) by [5].

Methods          Complexity
Basic MF         $O(k \cdot \sum_u |R(u)|)$
NAMF-1 by [8]    $O(k \cdot (J \cdot \sum_u |R(u)|^2 + J \cdot \sum_i |R(i)|^2))$
ASVD by [5]      $O(\sum_u |R(u)| (|R(u)| + |N(u)|))$
NAMF by [5]      $O(k \cdot (\sum_i |R(i)|^2 + \sum_u (|R(u)| + |N(u)|)))$
CBMF             $O(k \cdot \sum_u |R(u)|)$

5. RELATED WORKS

[2] employs the same idea of adding the effect of clusters to biased matrix factorization. However, they restrict themselves to using the clusters' biases only. The predictor function that they use is as follows:

$$\hat{r}_{ui} = p_u \cdot q_i + \mu_{C_G(i)} + b_{u, C_G(i)} + b_i$$

where $C_G(i)$ is a function that returns the group of any item $i$, $\mu_{C_G(i)}$ is the average rating in $C_G(i)$, and $b_{u, C_G(i)}$ is the bias of user $u$ for the group of items $C_G(i)$. It is hard to compare these two models because of the different setups and parameters. However, because [2] does not consider how categories share interests, our method can be expected to work better. For instance, running our algorithm using the clusters' biases only achieves an RMSE of 0.907765 on the MovieLens100k dataset, which is not better than our method's RMSE using the same parameters.

[8] presents a neighborhood-aware matrix factorization that includes neighborhood information in the basic matrix factorization. Their proposed algorithm computes three predictions for every user-item pair: a prediction $\hat{r}_{ui}^{MF}$ based on basic matrix factorization; a prediction $\hat{r}_{ui}^{user}$ based on a user-neighborhood model; and finally a prediction $\hat{r}_{ui}^{item}$ based on an item-neighborhood model. A combination of these three predictions is the final prediction of this algorithm. The rating prediction $\hat{r}_{ui}^{user}$ is computed as follows:

$$\hat{r}_{ui}^{user} = \frac{\sum_{v \in U_J(u)} c_{uv}^{user} \, \hat{r}_{vi}^{MF}}{\sum_{v \in U_J(u)} c_{uv}^{user}}$$

where $U_J(u)$ denotes the set of $J$ users with the highest correlation to user $u$, and $c_{uv}^{user}$ is the correlation between users $u$ and $v$. These correlations are obtained by counting the number of co-ratings of users (co-being-rated for items). The paper takes a similar approach in computing $\hat{r}_{ui}^{item}$. [8] mentions that using this neighborhood-aware model improves the accuracy and even the learning convergence. However, it is still sensitive to the choice of $J$. By increasing $J$, the processing time of learning in each epoch increases linearly.

Koren in [5] presents a neighborhood-aware recommender model named "Asymmetric-SVD". It includes the following predictor function in its recommender model:

$$\hat{r}_{ui} = \mu + b_u + b_i + |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj})\, w_{ij} + |N(u)|^{-1/2} \sum_{j \in N(u)} c_{ij}$$

where the weight from item $j$ to item $i$ is denoted by $w_{ij}$, and $c_{ij}$ is an added offset between these two items.

As another model, he models these $w_{ij}$ and $c_{ij}$ in a factorized approach [5]. He factors item-item relationships by associating each item $i$ with three vectors: $q_i, x_i, y_i \in \mathbb{R}^k$. The similarity weight between items is represented as $q_i^T x_j$, and similarly the structure $q_i^T y_j$ is imposed as the similarity bias. The proposed item-item model is as follows:

$$\hat{r}_{ui} = \mu + b_u + b_i + |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj})\, q_i^T x_j + |N(u)|^{-1/2} \sum_{j \in N(u)} q_i^T y_j$$

where $R(u)$ contains all the items that have been rated by user $u$, and $N(u)$ contains all items for which $u$ provided an implicit preference. He does the same modeling for the users. Finally, [5] proposes a fusion of the item-item and user-user models in a single model to take advantage of the information from both sides (we call the final method the factorized model in our comparison).

On the other hand, Xu et al. [9] employ clusters in a different way to improve the accuracy of predictions. They cluster items and users into subspaces where each user or item can belong to more than one cluster. Their main idea is to apply some collaborative filtering algorithm in each subgroup and then merge the prediction results together.

[1] presents a complete survey on neighborhood-based recommendation methods that covers many other extensions of matrix factorization. However, the accuracy of current neighborhood-aware models is sensitive to the number of neighbors considered. Increasing the number of neighbors in those models increases the learning time complexity.



6. CONCLUSIONS AND FUTURE WORKS 

In this paper we propose an extension of matrix factorization that adds more general neighborhood information to the recommendation model. Users and items are clustered into different categories to see how these categories share preferences. This complements the current neighborhood-aware matrix factorization models, which employ the direct neighborhood information of users and items. The proposed model is tested on two well-known recommendation system datasets: MovieLens and Netflix. Our experiment shows that using the novel deduced information of these categories improves the accuracy of recommendations. The proposed model's accuracy is significantly better than that of the "WgtNgbr" and "CorNgbr" models reported in [5] and competitive with (if not better than) more accurate models such as Asymmetric-SVD [5] when they employ a smaller number of neighbors. Also, it is shown that improving the quality of clusters affects the quality of recommendations.

As we use deeper similarities of items and more general interests of users, it is expected that blending our method with current neighborhood-aware recommender models will improve them. In future work, we are going to add this information of categories to the integrated factorized neighborhood-aware model proposed in [5]. Also, we are going to investigate how helpful our algorithm is for cold-start users.